
# README: HLS-CMDS Dataset
**Heart and Lung Sounds Dataset Recorded from a Clinical Manikin using Digital Stethoscope (HLS-CMDS)**

## Dataset Overview
This dataset contains 535 recordings of heart, lung, and mixed cardiopulmonary sounds, recorded from a CAE Juno clinical manikin with a 3M Littmann Core digital stethoscope. It includes single-source (heart-only and lung-only) and mixed recordings, covers both normal and abnormal sounds, and was collected at several anatomical chest locations.

It is intended for use in artificial intelligence (AI), machine learning, signal processing, and clinical research, particularly for automated cardiopulmonary disease detection, sound classification, source separation, and deep learning algorithm development.

## Dataset Composition
- **Heart Sounds (HS):** 50 recordings
- **Lung Sounds (LS):** 50 recordings
- **Mixed Heart and Lung Sounds (Mix):** 145 mixed recordings, each with its corresponding heart source and lung source recording (3 × 145 = 435 files)
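
The counts above can be cross-checked against the 535 recordings stated in the overview. The snippet below is only a small arithmetic sketch; the constant and function names are illustrative and not part of the dataset.

```python
# Recording counts as stated in the composition list.
HEART_ONLY = 50
LUNG_ONLY = 50
MIX_TRIPLETS = 145  # each triplet = mixed file + heart source + lung source


def total_recordings(heart_only=HEART_ONLY, lung_only=LUNG_ONLY, triplets=MIX_TRIPLETS):
    """Return the total number of .wav files implied by the stated counts."""
    return heart_only + lung_only + 3 * triplets


print(total_recordings())  # 535
```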

## File Structure
- `HS.zip` → contains heart-only recordings
- `LS.zip` → contains lung-only recordings
- `Mix.zip` → contains mixed recordings, along with their corresponding heart and lung source files
- `HS.csv`, `LS.csv`, `Mix.csv` → metadata files providing details for each recording
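
As a rough sketch, the three archives could be unpacked with Python's standard `zipfile` module. The helper name `extract_archives` and the choice of one output folder per archive are assumptions made for illustration; the internal folder layout of each archive is not described here, so the helper simply counts `.wav` members wherever they sit.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the file structure.
ARCHIVES = ("HS.zip", "LS.zip", "Mix.zip")


def extract_archives(src_dir, dest_dir, archives=ARCHIVES):
    """Extract each archive found in src_dir into dest_dir/<archive stem>.

    Returns a dict mapping archive name to the number of .wav members it
    contains. Archives that are not present in src_dir are skipped.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    wav_counts = {}
    for name in archives:
        archive_path = src_dir / name
        if not archive_path.is_file():
            continue  # archive not downloaded yet
        target = dest_dir / archive_path.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target)
            wav_counts[name] = sum(
                1
                for member in zf.namelist()
                if member.lower().endswith(".wav")
                and not member.startswith("__MACOSX/")  # ignore macOS resource forks
            )
    return wav_counts
```

Calling `extract_archives("downloads", "data")` (both paths hypothetical) returns the per-archive `.wav` counts, which can be compared with the 50, 50, and 435 files listed under Dataset Composition as a sanity check.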

## Metadata Fields
The accompanying CSV files include:
- **Gender:** F (female), M (male)
- **Auscultation Location:**
  - Heart: Apex (A), Right Upper Sternal Border (RUSB), Left Upper Sternal Border (LUSB), Left Lower Sternal Border (LLSB), Right Costal Margin (RC), Left Costal Margin (LC)
  - Lung: Right Upper Anterior (RUA), Right Mid Anterior (RMA), Right Lower Anterior (RLA), Left Upper Anterior (LUA), Left Mid Anterior (LMA), Left Lower Anterior (LLA)
- **Sound Type:**
  - **Heart Sounds:**
    - NH: Normal Heart
    - LDM: Late Diastolic Murmur
    - MSM: Mid Systolic Murmur
    - LSM: Late Systolic Murmur
    - AF: Atrial Fibrillation
    - S4: Fourth Heart Sound
    - ESM: Early Systolic Murmur
    - S3: Third Heart Sound
    - T: Tachycardia
    - AVB: Atrioventricular Block
  - **Lung Sounds:**
    - NL: Normal Lung
    - W: Wheezing
    - FC: Fine Crackle
    - R: Rhonchi
    - PR: Pleural Rub
    - CC: Coarse Crackle
- **Sound ID:** Name of the `.wav` file
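
For convenience, the abbreviations above can be kept in lookup tables so that codes read from the CSV files can be expanded to full names. This is a minimal sketch: the dictionary and function names are illustrative, and it assumes the CSV files store the abbreviations exactly as listed here.

```python
# Abbreviation tables transcribed from the metadata field list.
HEART_SOUND_TYPES = {
    "NH": "Normal Heart",
    "LDM": "Late Diastolic Murmur",
    "MSM": "Mid Systolic Murmur",
    "LSM": "Late Systolic Murmur",
    "AF": "Atrial Fibrillation",
    "S4": "Fourth Heart Sound",
    "ESM": "Early Systolic Murmur",
    "S3": "Third Heart Sound",
    "T": "Tachycardia",
    "AVB": "Atrioventricular Block",
}

LUNG_SOUND_TYPES = {
    "NL": "Normal Lung",
    "W": "Wheezing",
    "FC": "Fine Crackle",
    "R": "Rhonchi",
    "PR": "Pleural Rub",
    "CC": "Coarse Crackle",
}

HEART_LOCATIONS = {
    "A": "Apex",
    "RUSB": "Right Upper Sternal Border",
    "LUSB": "Left Upper Sternal Border",
    "LLSB": "Left Lower Sternal Border",
    "RC": "Right Costal Margin",
    "LC": "Left Costal Margin",
}

LUNG_LOCATIONS = {
    "RUA": "Right Upper Anterior",
    "RMA": "Right Mid Anterior",
    "RLA": "Right Lower Anterior",
    "LUA": "Left Upper Anterior",
    "LMA": "Left Mid Anterior",
    "LLA": "Left Lower Anterior",
}


def decode(code, table):
    """Expand an abbreviation using one of the tables above."""
    key = code.strip().upper()
    if key not in table:
        raise KeyError(f"unknown code {code!r}; expected one of {sorted(table)}")
    return table[key]


print(decode("AF", HEART_SOUND_TYPES))  # Atrial Fibrillation
```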

## Technical Details
- **Format:** `.wav` audio files
- **Sampling Rate:** 22,050 Hz
- **Duration:** 15 seconds per recording
- **Stethoscope Modes Used:** Bell (low-frequency), Diaphragm (high-frequency), Midrange
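
At 22,050 Hz, a 15-second recording corresponds to 22,050 × 15 = 330,750 samples per channel. The sketch below uses Python's standard `wave` module to read a file's header and compare it with these figures. It assumes the files are PCM-encoded (the only encoding the `wave` module reads); bit depth and channel count are not specified above, so they are reported rather than checked. The function names and the half-second tolerance are illustrative choices.

```python
import wave

EXPECTED_RATE_HZ = 22_050
EXPECTED_DURATION_S = 15
EXPECTED_FRAMES = EXPECTED_RATE_HZ * EXPECTED_DURATION_S  # 330,750 samples per channel


def describe_wav(path):
    """Read basic header information from a PCM .wav file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "rate_hz": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frames": frames,
            "duration_s": frames / rate,
        }


def matches_spec(info, tolerance_s=0.5):
    """Check sampling rate exactly and duration within a tolerance (assumed value)."""
    return (
        info["rate_hz"] == EXPECTED_RATE_HZ
        and abs(info["duration_s"] - EXPECTED_DURATION_S) <= tolerance_s
    )
```

Running `matches_spec(describe_wav(path))` over every extracted file is a quick way to spot recordings that deviate from the stated format before feeding them to a pipeline.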

## Usage Instructions
1. Download and extract all `.zip` files.
2. Use the metadata CSV files to match each recording with its gender, auscultation site, and sound type and, for mixed recordings, with its heart and lung source files.
3. Process `.wav` files using your preferred audio analysis or machine learning pipeline.
4. For supervised tasks, use the provided class labels. For unsupervised tasks such as blind source separation, use the mixed recordings; their corresponding heart and lung source files are available as references.
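
Step 2 could be sketched as follows, using only the standard library. The exact column headers in `HS.csv`, `LS.csv`, and `Mix.csv` are not listed in this README, so the helper takes them as arguments; the function name `index_by_label` and any header strings passed to it are hypothetical and should be checked against the first row of the actual CSV file. The sketch also assumes a Sound ID may appear with or without the `.wav` extension.

```python
import csv
from collections import defaultdict
from pathlib import Path


def index_by_label(csv_path, audio_dir, id_column, label_column):
    """Group .wav paths by class label, based on one metadata CSV file.

    id_column and label_column are the CSV header names holding the
    sound ID and the sound type; both must be supplied by the caller.
    """
    audio_dir = Path(audio_dir)
    by_label = defaultdict(list)
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            sound_id = row[id_column].strip()
            if not sound_id.lower().endswith(".wav"):
                sound_id += ".wav"  # assumption: IDs may omit the extension
            by_label[row[label_column].strip()].append(audio_dir / sound_id)
    return dict(by_label)
```

The returned dictionary maps each sound-type code (for example `NH` or `W`) to a list of file paths, which can then be split into training and test sets or passed to an audio loader.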

## Citation
If you use this dataset, please cite:
Y. Torabi, S. Shirani, and J. P. Reilly, “Descriptor: Heart and Lung Sounds Dataset Recorded from a Clinical Manikin using Digital Stethoscope (HLS-CMDS),” in IEEE Data Descriptions, https://doi.org/10.1109/IEEEDATA.2025.3566012.